16 research outputs found

A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

Author: Dingare Shipra
Finkel Jenny
Grover Claire
Manning Christopher
Nissim Malvina
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2005

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Using the NITE XML Toolkit on the Switchboard Corpus to study syntactic choice: a case study

Author: Carletta Jean
Dingare Shipra
Nikitina Tatiana
Nissim Malvina
Publication date: 01/05/2004

Edinburgh Research Explorer

An annotation scheme for information status in dialogue

Author: Carletta Jean
Dingare Shipra
Nissim Malvina
Steedman Mark
Publication date: 01/01/2004

We present an annotation scheme for information status (IS) in dialogue, and validate it on three Switchboard dialogues. We show that our scheme has good reproducibility, and compare it with previous attempts to code IS and related features. We eventually apply the scheme to 147 dialogues, thus producing a corpus that contains nearly 70,000 NPs annotated for IS and over 15,000 coreference links.

CiteSeerX

Edinburgh Research Explorer

Exploring the boundaries: gene and protein identification in biomedical text

Author: Beatrice Alex
Christopher D Manning
Claire Grover
Jenny Finkel
Malvina Nissim
Shipra Dingare
Publication venue: BMC Bioinformatics (BioMed Central)
Publication date: 01/01/2004

Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the “open” evaluation and a precision of 0.78 and recall of 0.85 in the “closed” evaluation. Conclusions: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches. Background The explosion of information in the biomedical domain and particularly in genetics has highlighted the need for automated text information extraction techniques. MEDLINE, the primary research database serving the biomedical community, currently contains over 14 million abstracts, with 60,000 new abstracts appearing each month. There is also an impressive number of molecular biological databases covering a

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Domain-specific Web site identification: the CROSSMARC focused Web crawler

Author: Curran James
Dingare Shipra
Grover Claire
Horlock James
Karkaletsis Vangelis
Paliouras Georgios
Stamatakis Konstantinos
Publication date: 01/01/2003

Edinburgh Research Explorer

Information Retrieval Systems Adapted to the Biomedical Domain

Author: Ai Kawazoe
Alexa McCray
Alexander Morgan
Christian Jacquemin
Cohen Aaron
Cornelius Rosse
GuoDong Zhou
Hamish Cunningham
Holger Stenzhorn
Hongfang Liu
Irena Spasic
Larisa Soldatova
Matthias Samwald
Poibeau Thierry
Ricardo Baeza-Yates
Robert Gaizauskas
Shipra Dingare
Sophia Ananiadou
Steffen Schulze-Kremer
Ulf Leser
Yoshimasa Tsuruoka
Publication venue: 'Ediciones Profesionales de la Informacion SL'
Publication date: 01/05/2010

The terminology used in Biomedicine shows lexical peculiarities that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of the techniques used in this domain, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources. Comment: 6 pages, 4 tables

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Temaria - Revistas digitales de biblioteconomía y documentación

Universidad Carlos III de Madrid e-Archivo

The Effect of Feature Hierarchies on Frequencies of Passivization in English

Author: Shipra Dingare

It is my pleasure to thank Joan Bresnan, my principal advisor, for making this thesis seem manageable, and for pointing me in directions for research, and for consistently being enthusiastic and suggesting solutions to problem after problem, and equally, for suggesting problems and keeping me aware of the issues involved. And particularly for always making me feel more comfortable, in classes and meetings and a number of other situations, to express myself freely, and for handling it gracefully when I was incoherent. I was lucky to be able to work with her and it was a unique experience watching her work. Thanks to Chris Manning, my second advisor, for being helpful in a number of ways despite the incredible demands on his time this year, particularly in providing advice with statistics, and helping me to search corpora, and setting up software for searching them, and for reading what I wrote critically, and in general for being down-to-earth. Thanks to both Chris and Joan for giving me the opportunity to work on such an interesting project. I have really enjoyed our meetings and watching the ideas in the project take shape. Thanks especially for a wonderful trip to Hong Kong. And thanks to Tom Wasow for helping me get involved with the project and for teaching, with Ivan Sag, an introductory syntax class which I very much liked. A very special thanks to Ivan Sag, who is almost entirely responsible, along with Tom Wasow, fo

CiteSeerX

Soft Constraints Mirror Hard Constraints: Voice and Person in English and Lummi

Author: Christopher D. Manning
Joan Bresnan
Shipra Dingare
Publication venue: Publications
Publication date: 01/01/2001

The same categorical phenomena which are attributed to hard grammatical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. The effects of a hierarchy of person (1st, 2nd, 3rd) on grammar are categorical in some languages, most famously in languages with inverse systems, but also in languages with person restrictions on passivization. In Lummi, for example, the person of the subject argument cannot be lower than the person of a nonsubject argument. If this would happen in the active, passivization is obligatory; if it would happen in the passive, the active is obligatory (Jelinek and Demers 1983). These facts follow from the theory of harmonic alignment in OT: constraints favoring the harmonic association of prominent person (1st, 2nd) with prominent syntactic function (subject) are hypothesized to be present as subhierarchies of the grammars of all languages, but to vary ..

CiteSeerX

A system for identifying named entities in biomedical text: How results from two evaluations reflect on both the system and the evaluations

Author: Christopher Manning
Claire Grover
Jenny Finkel
Malvina Nissim
Shipra Dingare
Publication venue: The 2004 BioLink meeting at ISMB

CiteSeerX